CONVERGENCE OF MULTI-BLOCK BREGMAN ADMM
FOR NONCONVEX COMPOSITE PROBLEMS 

Fenghui Wang, Wenfei Cao, Zongben Xu

School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, 710049, China 

Abstract. The alternating direction method with multipliers (ADMM) has been one of the most
powerful and successful methods for solving various composite problems. The convergence of
the conventional ADMM (i.e., 2-block) for convex objective functions has long been established,
whereas its convergence for nonconvex objective functions was established only very recently.
The multi-block ADMM, a natural extension of ADMM, is a widely used scheme and has also been
found very useful in solving various nonconvex optimization problems. It is thus desirable to
establish a convergence theory for the multi-block ADMM in nonconvex settings. In this paper
we present a Bregman modification of 3-block ADMM and establish its convergence for a large
family of nonconvex functions. We further extend the convergence results to the N-block case
(N > 3), which underlines the feasibility of multi-block ADMM applications in nonconvex
settings. Finally, we present a simulation study and a real-world application to support the
correctness of the obtained theoretical assertions.

Keywords: nonconvex regularization, alternating direction method, subanalytic function, K-L 
inequality, Bregman distance. 


1. Introduction 


Many problems arising in the fields of signal & image processing and machine learning [7, 15, 41]
involve finding a minimizer of the sum of $N$ ($N \ge 2$) functions with linear equality
constraint. If $N = 2$, the problem then consists of solving

$$\min\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = 0, \tag{1}$$

where $A \in \mathbb{R}^{m\times n_1}$ and $B \in \mathbb{R}^{m\times n_2}$ are given matrices,
$f : \mathbb{R}^{n_1} \to \mathbb{R}$ is a proper lower semicontinuous function, and
$g : \mathbb{R}^{n_2} \to \mathbb{R}$ is a smooth function. Because of its separable structure,
problem (1) can be efficiently solved by ADMM, namely, through the procedure


$$
\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, p^k) \\
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, p^k) \\
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1})
\end{cases} \tag{2}
$$

where $\alpha$ is a penalty parameter and

$$L_\alpha(x,y,p) := f(x) + g(y) + \langle p, Ax + By\rangle + \frac{\alpha}{2}\|Ax + By\|^2$$

is the associated augmented Lagrangian function with multiplier $p$. So far, various variants
of the conventional ADMM have been suggested. Among such varieties, Bregman ADMM
(BADMM) is the one designed to improve the performance of procedure (2) [20, 42, 43, 56].
More specifically, BADMM takes the following iterative form:

$$
\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, p^k) + \Delta_\phi(x, x^k) \\
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, p^k) + \Delta_\psi(y, y^k) \\
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1}),
\end{cases} \tag{3}
$$

where $\Delta_\phi$ and $\Delta_\psi$ are the Bregman distances with respect to functions $\phi$
and $\psi$, respectively.
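
As a rough illustration of iteration (3), the following Python sketch runs BADMM on a scalar toy problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the problem min |x| + 0.5(y - b)^2 subject to x - y = 0 (so A = 1, B = -1), the Bregman kernels phi = psi = 0.5(.)^2, the penalty value 1, and the helper names `soft_threshold` and `badmm_2block`. With these choices both subproblems have closed-form solutions.

```python
def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def badmm_2block(b=3.0, alpha=1.0, iters=200):
    """Bregman ADMM for  min |x| + 0.5*(y - b)^2  s.t.  x - y = 0.

    Toy setup (assumed for illustration): A = 1, B = -1 and
    phi = psi = 0.5*(.)^2, so each Bregman term is 0.5*(u - u_k)^2.
    """
    x = y = p = 0.0
    for _ in range(iters):
        # x-step: minimize |x| + p*x + (alpha/2)*(x - y)^2 + 0.5*(x - x_k)^2
        x = soft_threshold(alpha * y - p + x, 1.0) / (alpha + 1.0)
        # y-step: minimize 0.5*(y - b)^2 - p*y + (alpha/2)*(x - y)^2 + 0.5*(y - y_k)^2
        y = (b + p + alpha * x + y) / (alpha + 2.0)
        # multiplier step: p + alpha*(A x + B y)
        p = p + alpha * (x - y)
    return x, y, p


x_final, y_final, p_final = badmm_2block()
```

For b = 3 the iterates approach x = y = 2, the minimizer of |x| + 0.5(x - 3)^2, with multiplier p = -1.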

ADMM was introduced in the early 1970s [21, 22], and its convergence properties for convex
objective functions have been extensively studied. The first convergence result was established
for strongly convex functions [21, 22], and then extended to general convex functions fTTlfTSll .
It has been shown that ADMM can converge at a sublinear rate of $O(1/k)$ [25, 36], and $O(1/k^2)$
for the accelerated version [23]. The convergence of BADMM for convex objective functions
has also been examined with the Euclidean distance [14], Mahalanobis distance [56], and the
general Bregman distance [56].

Recently, there has been an increasing interest in the study of ADMM for nonconvex objective
functions. On one hand, the ADMM algorithm is highly successful in solving various nonconvex
examples ranging from nonnegative matrix factorization, distributed matrix factorization,
distributed clustering, sparse zero variance discriminant analysis, polynomial optimization,
tensor decomposition, to matrix completion (see e.g. HU [331 07J ED El). On the other hand,
the convergence analysis of nonconvex ADMM is generally very difficult, due to the failure of
the Fejér monotonicity of iterates. In [27], the subsequential convergence of ADMM for general
nonconvex functions has been proved. Furthermore, the global convergence of ADMM for
certain types of nonconvex functions has been proved in lf3Tl[44l .

The purpose of the present study is to examine the convergence of ADMM with 3 blocks (i.e.,
$N = 3$). The obtained results can then be naturally generalized to the case of ADMM with
multiple blocks. Thus, in the present paper we first consider the following 3-block composite
optimization problem:

$$\min\ f(x) + g(y) + h(z) \quad \text{s.t.}\quad Ax + By + Cz = 0, \tag{4}$$

where $A \in \mathbb{R}^{m\times n_1}$, $B \in \mathbb{R}^{m\times n_2}$ and
$C \in \mathbb{R}^{m\times n_3}$ are given matrices, $f : \mathbb{R}^{n_1} \to \mathbb{R}$ and
$g : \mathbb{R}^{n_2} \to \mathbb{R}$ are proper lower semicontinuous functions, and
$h : \mathbb{R}^{n_3} \to \mathbb{R}$ is a smooth function. To solve such a
problem, it is natural to extend the ADMM to the following form:

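$$
\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, z^k, p^k) \\
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, z^k, p^k) \\
z^{k+1} = \arg\min_{z\in\mathbb{R}^{n_3}} L_\alpha(x^{k+1}, y^{k+1}, z, p^k) \\
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1})
\end{cases} \tag{5}
$$

where the augmented Lagrangian function
$L_\alpha : \mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^{n_3}\times\mathbb{R}^{m} \to \mathbb{R}$
is defined by

$$L_\alpha(x,y,z,p) := f(x) + g(y) + h(z) + \langle p, Ax + By + Cz\rangle + \frac{\alpha}{2}\|Ax + By + Cz\|^2. \tag{6}$$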

Unlike the conventional ADMM with 2 blocks, the convergence of algorithm (5), called the
3-block ADMM henceforth, has remained unclear even for convex objective functions. Although
it is not necessarily convergent in general [13], the 3-block ADMM does converge under some
restrictive conditions; for example, under the strong convexity condition of all objective
functions (see e.g. [25]). Recently, Li, Sun, and Toh [32] proposed a modification of algorithm (5),
called the semi-proximal 3-block ADMM, as follows:

$$
\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, z^k, p^k) + \frac{1}{2}\|x - x^k\|_{T_1}^2 \\
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, z^k, p^k) + \frac{1}{2}\|y - y^k\|_{T_2}^2 \\
z^{k+1} = \arg\min_{z\in\mathbb{R}^{n_3}} L_\alpha(x^{k+1}, y^{k+1}, z, p^k) + \frac{1}{2}\|z - z^k\|_{T_3}^2 \\
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1})
\end{cases} \tag{7}
$$

where $\|\cdot\|_{T_i}$ denotes ellipsoidal norms, $i = 1,2,3$. They proved the convergence of the
algorithm when $f$, $g$, $h$ are all convex and at least one of them is strongly convex.

Motivated by Bregman ADMM, we propose to use the following 3-block Bregman ADMM
for solving the optimization problem (4):

$$
\begin{cases}
x^{k+1} = \arg\min_{x\in\mathbb{R}^{n_1}} L_\alpha(x, y^k, z^k, p^k) + \Delta_\phi(x, x^k) \\
y^{k+1} = \arg\min_{y\in\mathbb{R}^{n_2}} L_\alpha(x^{k+1}, y, z^k, p^k) + \Delta_\psi(y, y^k) \\
z^{k+1} = \arg\min_{z\in\mathbb{R}^{n_3}} L_\alpha(x^{k+1}, y^{k+1}, z, p^k) + \Delta_\varphi(z, z^k) \\
p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1})
\end{cases} \tag{8}
$$

where, as mentioned before, $\Delta_\phi$, $\Delta_\psi$ and $\Delta_\varphi$ are the Bregman
distances associated with functions $\phi$, $\psi$, and $\varphi$, respectively. In the present
paper, our aim is to justify the convergence of 3-block BADMM under nonconvex frameworks. We
will show that the 3-block BADMM converges if the objective function is subanalytic and matrix
$C$ has full row rank.
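
The 3-block Bregman ADMM iteration can be sketched in a few lines of Python. The example below is only a toy under stated assumptions, none of which come from the paper: the scalar problem min |x| + |y| + 0.5(z - b)^2 subject to x + y - z = 0 (so A = B = 1, C = -1), all three Bregman kernels equal to 0.5(.)^2, a penalty of 25, and the helper names `soft_threshold` and `badmm_3block`. The toy objective is convex, chosen so that the limit can be checked by hand; it is not meant to represent the nonconvex problems the paper targets.

```python
def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def badmm_3block(b=3.0, alpha=25.0, iters=5000):
    """3-block Bregman ADMM for
         min |x| + |y| + 0.5*(z - b)^2   s.t.   x + y - z = 0.

    Assumed toy data: A = B = 1, C = -1, Bregman kernels 0.5*(.)^2,
    so every Bregman term is 0.5*(u - u_k)^2.
    """
    x = y = z = p = 0.0
    for _ in range(iters):
        # x-step: |x| + p*x + (alpha/2)*(x + y - z)^2 + 0.5*(x - x_k)^2
        x = soft_threshold(alpha * (z - y) - p + x, 1.0) / (alpha + 1.0)
        # y-step: same structure, using the freshly updated x
        y = soft_threshold(alpha * (z - x) - p + y, 1.0) / (alpha + 1.0)
        # z-step: smooth quadratic, solved exactly
        z = (b + p + alpha * (x + y) + z) / (alpha + 2.0)
        # multiplier step: p + alpha*(A x + B y + C z)
        p = p + alpha * (x + y - z)
    return x, y, z, p


x_end, y_end, z_end, p_end = badmm_3block()
```

For b = 3 the constraint residual x + y - z tends to zero, z approaches 2 and the multiplier approaches -1; the split of the sum x + y = 2 between x and y depends on the starting point, since the toy problem has a whole segment of minimizers.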


2. Preliminaries 

In what follows, $\mathbb{R}^n$ will stand for the $n$-dimensional Euclidean space,

$$\langle x, y\rangle = x^T y = \sum_{i=1}^n x_i y_i, \qquad \|x\| = \sqrt{\langle x, x\rangle},$$

where $x, y \in \mathbb{R}^n$ and $T$ stands for the transpose operation.

2.1. Subdifferentials. Given a function $f : \mathbb{R}^n \to \mathbb{R}$ we denote by dom $f$
the domain of $f$, namely, dom $f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$. A function $f$ is
said to be proper if dom $f \neq \emptyset$; lower semicontinuous at the point $x_0$ if

$$\liminf_{x\to x_0} f(x) \ge f(x_0).$$

If $f$ is lower semicontinuous at every point of its domain of definition, then it is simply
called a lower semicontinuous function.


Definition 2.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a proper lower semicontinuous function.

(i) Given $x \in$ dom $f$, the Fréchet subdifferential of $f$ at $x$, written $\hat\partial f(x)$,
is the set of all elements $u \in \mathbb{R}^n$ which satisfy

$$\liminf_{y\neq x,\ y\to x} \frac{f(y) - f(x) - \langle u, y - x\rangle}{\|x - y\|} \ge 0.$$

(ii) The limiting subdifferential, or simply subdifferential, of $f$ at $x$, written
$\partial f(x)$, is defined as

$$\partial f(x) = \{u \in \mathbb{R}^n : \exists\, x^k \to x,\ f(x^k) \to f(x),\ u^k \in \hat\partial f(x^k) \to u,\ k \to \infty\}.$$

(iii) A critical point or stationary point of $f$ is a point $x^*$ in the domain of $f$
satisfying $0 \in \partial f(x^*)$.


Definition 2.2. An element $w^* := (x^*, y^*, z^*, p^*)$ is called a critical point or stationary
point of the Lagrangian function $L_\alpha$ defined as in (6) if it satisfies:

$$
\begin{cases}
A^T p^* \in -\partial f(x^*), \quad B^T p^* \in -\partial g(y^*), \\
C^T p^* = -\nabla h(z^*), \quad Ax^* + By^* + Cz^* = 0.
\end{cases} \tag{9}
$$

The existence of proper lower semicontinuous functions and properties of subdifferential can 
see 071 . We particularly collect the following basic properties of the subdifferential. 

Proposition 2.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$ be
proper lower semicontinuous functions. Then the following holds:

(i) $\hat\partial f(x) \subset \partial f(x)$ for each $x \in \mathbb{R}^n$. Moreover, the first
set is closed and convex, while the second is closed, and not necessarily convex.

(ii) Let $(u^k, x^k)$ be sequences such that $x^k \to x$, $u^k \to u$, $f(x^k) \to f(x)$ and
$u^k \in \partial f(x^k)$. Then $u \in \partial f(x)$.

(iii) Fermat's rule remains true: if $x_0 \in \mathbb{R}^n$ is a local minimizer of $f$, then
$x_0$ is a critical point or stationary point of $f$, that is, $0 \in \partial f(x_0)$.

(iv) If $f$ is a continuously differentiable function, then
$\partial(f + g)(x) = \nabla f(x) + \partial g(x)$.
A function $f$ is said to be $\ell_f$-Lipschitz continuous ($\ell_f \ge 0$) if

$$\|f(x) - f(y)\| \le \ell_f \|x - y\|,$$

for any $x, y \in$ dom $f$; $\rho$-strongly convex ($\rho > 0$) if

$$f(y) \ge f(x) + \langle \xi(x), y - x\rangle + \frac{\rho}{2}\|y - x\|^2, \tag{10}$$

for any $x, y \in$ dom $f$ and $\xi(x) \in \partial f(x)$; coercive if

$$\lim_{\|x\|\to\infty} f(x) = +\infty. \tag{11}$$

2.2. Kurdyka-Łojasiewicz inequality. The Kurdyka-Łojasiewicz (K-L) inequality was first
introduced by Łojasiewicz [38] for real analytic functions, and then was extended by Kurdyka [29]
to smooth functions whose graph belongs to an o-minimal structure. Recently, this notion was
further extended for nonsmooth subanalytic functions 0 |. 

Definition 2.3 (K-L inequality). A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to satisfy
the K-L inequality at $x_0$ if there exist $\eta > 0$, $\delta > 0$, $\varphi \in \mathcal{A}_\eta$
such that for all $x \in O(x_0, \delta) \cap \{x : f(x_0) < f(x) < f(x_0) + \eta\}$

$$\varphi'(f(x) - f(x_0))\,\mathrm{dist}(0, \partial f(x)) \ge 1,$$

where $\mathrm{dist}(x_0, \partial f(x)) := \inf\{\|x_0 - y\| : y \in \partial f(x)\}$, and
$\mathcal{A}_\eta$ stands for the class of functions $\varphi : [0, \eta) \to \mathbb{R}_+$ with
the properties: (a) $\varphi$ is continuous on $[0, \eta)$; (b) $\varphi$ is smooth concave on
$(0, \eta)$; (c) $\varphi(0) = 0$, $\varphi'(x) > 0$, $\forall x \in (0, \eta)$.
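
To make Definition 2.3 concrete, the short check below evaluates the left-hand side of the K-L inequality for f(x) = x^2 at x_0 = 0, using the desingularizing function sqrt(s). Both choices and the helper name `kl_product` are illustrative assumptions, not examples from the paper. Since f is smooth, dist(0, ∂f(x)) = |f'(x)|, and the product equals (1 / (2|x|)) * 2|x| = 1 for every x ≠ 0, so the inequality holds with equality.

```python
import math


def kl_product(x, phi_prime, f, grad_f, f_ref=0.0):
    """Return phi'(f(x) - f_ref) * |f'(x)| for a smooth one-dimensional f.

    For smooth f the subdifferential is the singleton {f'(x)}, so
    dist(0, subdifferential) is simply |f'(x)|.
    """
    return phi_prime(f(x) - f_ref) * abs(grad_f(x))


def f(x):
    return x * x


def grad_f(x):
    return 2.0 * x


def phi_prime(s):
    # derivative of phi(s) = sqrt(s): continuous on [0, eta), concave,
    # phi(0) = 0 and phi' > 0 on (0, eta)
    return 0.5 / math.sqrt(s)


kl_values = [kl_product(x, phi_prime, f, grad_f) for x in (-0.5, -1e-3, 1e-6, 0.25)]
```

Every entry of `kl_values` is 1 up to rounding, which is the boundary case of the inequality.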

The following is an extension of the conventional K-L inequality 0. 

Lemma 2.2 (K-L inequality on compact subsets). Let $f : \mathbb{R}^n \to \mathbb{R}$ be a proper
lower semicontinuous function and let $\Omega \subset \mathbb{R}^n$ be a compact set. If $f$ is
constant on $\Omega$ and $f$ satisfies the K-L inequality at each point in $\Omega$, then there
exist $\eta > 0$, $\delta > 0$, $\varphi \in \mathcal{A}_\eta$ such that for all $x_0 \in \Omega$
and for all $x \in \{x \in \mathbb{R}^n : \mathrm{dist}(x, \Omega) < \delta\} \cap \{x \in \mathbb{R}^n : f(x_0) < f(x) < f(x_0) + \eta\}$,

$$\varphi'(f(x) - f(x_0))\,\mathrm{dist}(0, \partial f(x)) \ge 1.$$

Typical functions satisfying the K-L inequality include strongly convex functions, real
analytic functions, semi-algebraic functions and subanalytic functions.

A subset $C \subset \mathbb{R}^n$ is said to be semi-algebraic if it can be written as

$$C = \bigcup_{j=1}^{r} \bigcap_{i=1}^{s} \{x \in \mathbb{R}^n : g_{ij}(x) = 0,\ h_{ij}(x) < 0\},$$

where $g_{ij}, h_{ij} : \mathbb{R}^n \to \mathbb{R}$ are real polynomial functions. Then a
function $f : \mathbb{R}^n \to \mathbb{R}$ is called semi-algebraic if its graph

$$\mathcal{G}(f) := \{(x, y) \in \mathbb{R}^{n+1} : f(x) = y\}$$

is a semi-algebraic subset in $\mathbb{R}^{n+1}$. For example, the $\ell_q$ norm
$\|x\|_q^q := \sum_i |x_i|^q$ with $0 < q < 1$, the sup-norm $\|x\|_\infty := \max_i |x_i|$, the
Euclidean norm $\|x\|$, $\|Ax - b\|_q^q$, $\|Ax - b\|$ and $\|Ax - b\|_\infty$ are all
semi-algebraic functions for any matrix $A$ [15, 48].
A real function on $\mathbb{R}$ is said to be analytic if it possesses derivatives of all orders
and agrees with its Taylor series in a neighborhood of every point. A real function $f$ on
$\mathbb{R}^n$ is said to be analytic if the function of one variable $g(t) := f(x + ty)$ is
analytic for any $x, y \in \mathbb{R}^n$. It is readily seen that real polynomial functions such
as quadratic functions $\|Ax - b\|^2$ are analytic. Moreover, the $\varepsilon$-smoothed $\ell_q$
norm $\|x\|_{\varepsilon,q} := \sum_i (x_i^2 + \varepsilon)^{q/2}$ with $0 < q < 1$ and the logistic
loss function $\log(1 + e^{-t})$ are all examples of real analytic functions [38].

A subset $C \subset \mathbb{R}^n$ is said to be subanalytic if it can be written as

$$C = \bigcup_{j=1}^{r} \bigcap_{i=1}^{s} \{x \in \mathbb{R}^n : g_{ij}(x) = 0,\ h_{ij}(x) < 0\},$$

where $g_{ij}, h_{ij} : \mathbb{R}^n \to \mathbb{R}$ are real analytic functions. Then a function
$f : \mathbb{R}^n \to \mathbb{R}$ is called subanalytic if its graph $\mathcal{G}(f)$ is a
subanalytic subset in $\mathbb{R}^{n+1}$. It is clear that both real analytic and semi-algebraic
functions are subanalytic. Generally speaking, the sum of two subanalytic functions is not
necessarily subanalytic. It is known, however, that for two subanalytic functions, if at least
one function maps bounded sets to bounded sets, then their sum is also subanalytic, as shown
in 008]. In particular, the sum of a subanalytic function and an analytic function is
subanalytic. Some subanalytic functions that are widely used are as follows:

(i) $\|Ax - b\|^2 + \lambda\|y\|_q^q$;

(ii) $\|Ax - b\|^2 + \lambda \sum_i (y_i^2 + \varepsilon)^{q/2}$;

(iii) $\frac{1}{n}\sum_{i=1}^{n} \log(1 + \exp(-c_i(a_i^T x + b))) + \lambda\|y\|_q^q$;

(iv) $\frac{1}{n}\sum_{i=1}^{n} \log(1 + \exp(-c_i(a_i^T x + b))) + \lambda \sum_i (y_i^2 + \varepsilon)^{q/2}$.

2.3. Bregman distance. The Bregman distance, first introduced in 1967 0, plays an important
role in various iterative algorithms. As a generalization of the squared Euclidean distance, the
Bregman distance shares many of its nice properties. However, the Bregman distance is not a
true metric, since it satisfies neither the triangle inequality nor symmetry. For a convex
differentiable function $\phi$, the associated Bregman distance is defined as

$$\Delta_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y\rangle.$$

In particular, if we let $\phi(x) := \|x\|^2$ in the above, then it reduces to $\|x - y\|^2$,
namely, the classical squared Euclidean distance. Some nontrivial examples of Bregman distance
include [2]:

(i) Itakura-Saito distance: $\sum_i x_i \log(x_i/y_i) - \sum_i (x_i - y_i)$;

(ii) Kullback-Leibler divergence: $\sum_i x_i \log(x_i/y_i)$;

(iii) Mahalanobis distance: $\|x - y\|_Q^2$, where $\|x\|_Q^2 = \langle Qx, x\rangle$ with $Q$ a
symmetric positive definite matrix.

The following proposition collects some useful properties of Bregman distance.

Proposition 2.3. Let $\phi$ be a convex differentiable function and $\Delta_\phi(x, y)$ the
associated Bregman distance.

(i) Non-negativity: $\Delta_\phi(x, y) \ge 0$, $\Delta_\phi(x, x) = 0$ for all $x, y$.

(ii) Convexity: $\Delta_\phi(x, y)$ is convex in $x$, but not necessarily in $y$.

(iii) Strong convexity: if $\phi$ is $\delta$-strongly convex, then
$\Delta_\phi(x, y) \ge \frac{\delta}{2}\|x - y\|^2$ for all $x, y$.
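
The definition of the Bregman distance translates directly into code. The sketch below is illustrative only: the helper `bregman`, the two kernels, and the test vectors are assumptions introduced here. It evaluates the distance for the kernel ‖x‖^2, which gives the squared Euclidean distance, and for the negative entropy sum_i x_i log x_i, which for vectors summing to one gives the Kullback-Leibler divergence; the last value illustrates the lack of symmetry.

```python
import math


def bregman(phi, grad_phi, x, y):
    """Delta_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for lists x, y."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def sq_norm(v):
    return sum(t * t for t in v)


def grad_sq_norm(v):
    return [2.0 * t for t in v]


def neg_entropy(v):
    # defined for strictly positive entries only
    return sum(t * math.log(t) for t in v)


def grad_neg_entropy(v):
    return [math.log(t) + 1.0 for t in v]


x = [0.2, 0.3, 0.5]
y = [0.4, 0.4, 0.2]

d_euclid = bregman(sq_norm, grad_sq_norm, x, y)               # equals ||x - y||^2
d_kl = bregman(neg_entropy, grad_neg_entropy, x, y)           # KL(x || y)
d_kl_reversed = bregman(neg_entropy, grad_neg_entropy, y, x)  # KL(y || x)
```

Both distances are non-negative, in line with Proposition 2.3 (i), and `d_kl` differs from `d_kl_reversed`, which is why the Bregman distance is not a metric.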
2.4. Basic assumptions. In the present paper, we will make the following assumptions:

Assumption 1. We assume that $f$, $g$, $h$, $C$, $\phi$, $\psi$, $\varphi$ in problem (4) have the
following properties:

(a1) $\langle CC^T x, x\rangle = \|x\|_{CC^T}^2 \ge \sigma_C \|x\|^2$, $\forall x \in \mathbb{R}^m$,
namely, $C$ has full row rank;

(a2) $\nabla h$, $\nabla\phi$, $\nabla\psi$, $\nabla\varphi$ are Lipschitz continuous;

(a3) either $f$ or $\phi$, either $g$ or $\psi$, and either $h$ or $\varphi$ are strongly convex;

(a4) $f + g + h$ is subanalytic,
where cre and 4 are both positive real numbers. 

In the implementation of BADMM (8), the parameter $\alpha$ and the smooth convex functions
$\phi$, $\psi$, and $\varphi$ should be regularized. We further assume Assumption 2:

$$\alpha > \frac{4[(\ell_h + \ell_\varphi)^2 + \ell_\varphi^2]}{\rho_3 \sigma_C}, \tag{12}$$

where $\rho_3$ is the strong convexity coefficient of $h$ or $\varphi$, and $\ell_h$ and
$\ell_\varphi$ are respectively the Lipschitz coefficients of $\nabla h$ and $\nabla\varphi$.
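
As a small worked example of Assumption 2, the helper below evaluates the right-hand side of (12), assuming the bound has the form 4[(l_h + l_phi)^2 + l_phi^2] / (rho_3 * sigma_C). The function name `penalty_lower_bound` and the sample constants are hypothetical; the constants correspond to a toy setting in which the smooth function is 0.5‖z - b‖^2, the third Bregman kernel is 0.5‖z‖^2, and C = -I.

```python
def penalty_lower_bound(l_h, l_phi, rho_3, sigma_c):
    """Right-hand side of (12), under the assumed form
    4 * [(l_h + l_phi)^2 + l_phi^2] / (rho_3 * sigma_c).

    l_h     : Lipschitz constant of grad h
    l_phi   : Lipschitz constant of the gradient of the third Bregman kernel
    rho_3   : strong convexity modulus of h or of that kernel
    sigma_c : constant from (a1), e.g. the smallest eigenvalue of C C^T
    """
    if rho_3 <= 0.0 or sigma_c <= 0.0:
        raise ValueError("rho_3 and sigma_c must be positive")
    return 4.0 * ((l_h + l_phi) ** 2 + l_phi ** 2) / (rho_3 * sigma_c)


# Toy constants (illustrative): all four quantities equal to one.
bound = penalty_lower_bound(l_h=1.0, l_phi=1.0, rho_3=1.0, sigma_c=1.0)
```

With these constants the bound is 20, so any penalty strictly above 20 satisfies the assumption in this toy setting.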

We remark that conditions (a1)-(a2) above are standard assumptions even for convex settings.
Condition (a3) is used to guarantee the sufficient descent property of iterates, and condition
(a4) is a basic assumption assuring that the function $\hat{L}$, to be defined in the next
section, can satisfy the K-L inequality, which in turn will imply the global convergence of the
proposed algorithm.

The smooth convex functions in the Bregman distance are very easily specified; for example,
take $\phi(\cdot) = \psi(\cdot) = \varphi(\cdot) = \frac{1}{2}\|\cdot\|^2$. Note that if $\phi$ is
$\rho_1$-strongly convex, then its Bregman distance satisfies

$$\Delta_\phi(x, y) \ge \frac{\rho_1}{2}\|x - y\|^2, \tag{13}$$

which follows from Proposition 2.3.


3. Convergence Analysis 


In this section, under Assumptions 1 and 2 we first give a convergence result for the
BADMM with 3-block procedure (8), and then extend this result to the N-block (N > 3) case.
The main results are presented in subsection 3.4.
For convenience, we first fix the following notations:

$$\sigma_0 = \frac{2\ell_\varphi^2}{\alpha\sigma_C}, \qquad
\sigma_1 = \frac{1}{2}\min\left\{\rho_1,\ \rho_2,\ \rho_3 - \frac{4(\ell_h + \ell_\varphi)^2}{\alpha\sigma_C} - \frac{4\ell_\varphi^2}{\alpha\sigma_C}\right\},$$

$$u = (x, y, z), \quad w = (x, y, z, p), \quad \hat{w} = (x, y, z, p, \hat{z}),$$

$$u^k = (x^k, y^k, z^k), \quad w^k = (x^k, y^k, z^k, p^k), \quad \hat{w}^k = (x^k, y^k, z^k, p^k, z^{k-1}),$$

$$\|u\| = (\|x\|^2 + \|y\|^2 + \|z\|^2)^{1/2}, \quad \|u\|_1 = \|x\| + \|y\| + \|z\|,$$

where $\rho_1$ is the strong convexity coefficient of $f$ or $\phi$, and $\rho_2$ is the strong
convexity coefficient of $g$ or $\psi$. Clearly both $\sigma_0$ and $\sigma_1$ are positive by our
assumptions. Also, we define a new function
$\hat{L} : \mathbb{R}^{n_1}\times\mathbb{R}^{n_2}\times\mathbb{R}^{n_3}\times\mathbb{R}^{m}\times\mathbb{R}^{n_3} \to \mathbb{R}$ by

$$\hat{L}(\hat{w}) = L_\alpha(w) + \sigma_0\|z - \hat{z}\|^2. \tag{14}$$


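3.1. Some lemmas. We establish a series of lemmas to support the proof of convergence of
BADMM with 3-block procedure (8).


Lemma 3.1. For each $k \in \mathbb{N}$

$$\|p^{k+1} - p^k\|^2 \le \frac{2(\ell_h + \ell_\varphi)^2}{\sigma_C}\|z^{k+1} - z^k\|^2 + \frac{2\ell_\varphi^2}{\sigma_C}\|z^k - z^{k-1}\|^2. \tag{15}$$

Proof. By our assumptions on $C$, we have

$$\|C^T(p^{k+1} - p^k)\|^2 = \langle CC^T(p^{k+1} - p^k), p^{k+1} - p^k\rangle \ge \sigma_C\|p^{k+1} - p^k\|^2. \tag{16}$$

Applying Fermat's rule to the z-subproblem in (8), we then get

$$\nabla h(z^{k+1}) + C^T(p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1})) + \nabla\varphi(z^{k+1}) - \nabla\varphi(z^k) = 0.$$

Note that $p^{k+1} = p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1})$. It then follows that

$$\nabla h(z^{k+1}) + C^T p^{k+1} + \nabla\varphi(z^{k+1}) - \nabla\varphi(z^k) = 0, \tag{17}$$

so that

$$
\begin{aligned}
\|C^T(p^{k+1} - p^k)\|^2
&= \|\nabla h(z^{k+1}) - \nabla h(z^k) + (\nabla\varphi(z^{k+1}) - \nabla\varphi(z^k)) + (\nabla\varphi(z^{k-1}) - \nabla\varphi(z^k))\|^2 \\
&\le (\|\nabla h(z^{k+1}) - \nabla h(z^k)\| + \|\nabla\varphi(z^{k+1}) - \nabla\varphi(z^k)\| + \|\nabla\varphi(z^{k-1}) - \nabla\varphi(z^k)\|)^2 \\
&\le (\ell_h\|z^{k+1} - z^k\| + \ell_\varphi\|z^k - z^{k+1}\| + \ell_\varphi\|z^k - z^{k-1}\|)^2 \\
&\le 2(\ell_h + \ell_\varphi)^2\|z^{k+1} - z^k\|^2 + 2\ell_\varphi^2\|z^k - z^{k-1}\|^2.
\end{aligned}
$$

This together with (16) at once yields inequality (15). □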


Lemma 3.2. For each $k \in \mathbb{N}$

$$
\begin{aligned}
L_\alpha(w^{k+1}) \le{}& L_\alpha(w^k) + \left(\frac{2(\ell_h + \ell_\varphi)^2}{\alpha\sigma_C} - \frac{\rho_3}{2}\right)\|z^{k+1} - z^k\|^2 \\
&+ \frac{2\ell_\varphi^2}{\alpha\sigma_C}\|z^k - z^{k-1}\|^2 - \frac{\rho_1}{2}\|x^{k+1} - x^k\|^2 - \frac{\rho_2}{2}\|y^{k+1} - y^k\|^2.
\end{aligned} \tag{18}
$$

Proof. First we show that if either $f$ or $\phi$ is strongly convex, then it follows that

$$L_\alpha(x^{k+1}, y^k, z^k, p^k) \le L_\alpha(x^k, y^k, z^k, p^k) - \frac{\rho_1}{2}\|x^{k+1} - x^k\|^2. \tag{19}$$

In fact, if $f$ is strongly convex, then $L_\alpha(x, y^k, z^k, p^k) + \Delta_\phi(x, x^k)$ is
strongly convex with modulus $\rho_1$, and thus inequality (19) follows from (10). Let us now
justify the case where $\phi$ is strongly convex. As $x^{k+1}$ is a minimizer of
$L_\alpha(x, y^k, z^k, p^k) + \Delta_\phi(x, x^k)$, we have

$$
\begin{aligned}
L_\alpha(x^{k+1}, y^k, z^k, p^k) &\le L_\alpha(x^k, y^k, z^k, p^k) - \Delta_\phi(x^{k+1}, x^k) \\
&\le L_\alpha(x^k, y^k, z^k, p^k) - \frac{\rho_1}{2}\|x^{k+1} - x^k\|^2,
\end{aligned}
$$

where the last inequality follows from (13). Similarly, we have

$$L_\alpha(x^{k+1}, y^{k+1}, z^k, p^k) \le L_\alpha(x^{k+1}, y^k, z^k, p^k) - \frac{\rho_2}{2}\|y^{k+1} - y^k\|^2,$$

$$L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^k) \le L_\alpha(x^{k+1}, y^{k+1}, z^k, p^k) - \frac{\rho_3}{2}\|z^{k+1} - z^k\|^2,$$

and from the last equality in (8) we have

$$L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^{k+1}) = L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^k) + \frac{1}{\alpha}\|p^{k+1} - p^k\|^2.$$

Adding up the above formulas, we get

$$L_\alpha(w^{k+1}) \le L_\alpha(w^k) + \frac{1}{\alpha}\|p^{k+1} - p^k\|^2 - \frac{\rho_1}{2}\|x^{k+1} - x^k\|^2 - \frac{\rho_2}{2}\|y^{k+1} - y^k\|^2 - \frac{\rho_3}{2}\|z^{k+1} - z^k\|^2. \tag{20}$$

This together with (15) yields inequality (18) as desired.

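Lemma 3.3. For each $k \in \mathbb{N}$

$$\hat{L}(\hat{w}^{k+1}) \le \hat{L}(\hat{w}^k) - \sigma_1(\|x^{k+1} - x^k\|^2 + \|y^{k+1} - y^k\|^2 + \|z^k - z^{k+1}\|^2).$$

Proof. It follows from Lemmas 3.1 and 3.2 that

$$
\begin{aligned}
L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^{k+1}) - L_\alpha(x^k, y^k, z^k, p^k)
\le{}& \left(\frac{2(\ell_h + \ell_\varphi)^2}{\alpha\sigma_C} - \frac{\rho_3}{2}\right)\|z^{k+1} - z^k\|^2 + \frac{2\ell_\varphi^2}{\alpha\sigma_C}\|z^k - z^{k-1}\|^2 \\
&- \frac{\rho_1}{2}\|x^{k+1} - x^k\|^2 - \frac{\rho_2}{2}\|y^{k+1} - y^k\|^2,
\end{aligned}
$$

which implies

$$
\begin{aligned}
L_\alpha(x^{k+1}, y^{k+1}, z^{k+1}, p^{k+1}) + \sigma_0\|z^{k+1} - z^k\|^2
\le{}& L_\alpha(x^k, y^k, z^k, p^k) + \sigma_0\|z^k - z^{k-1}\|^2 \\
&- \left(\frac{\rho_3}{2} - \frac{2(\ell_h + \ell_\varphi)^2}{\alpha\sigma_C} - \frac{2\ell_\varphi^2}{\alpha\sigma_C}\right)\|z^k - z^{k+1}\|^2 \\
&- \frac{\rho_1}{2}\|x^{k+1} - x^k\|^2 - \frac{\rho_2}{2}\|y^{k+1} - y^k\|^2 \\
\le{}& L_\alpha(x^k, y^k, z^k, p^k) + \sigma_0\|z^k - z^{k-1}\|^2 \\
&- \sigma_1(\|x^{k+1} - x^k\|^2 + \|y^{k+1} - y^k\|^2 + \|z^k - z^{k+1}\|^2).
\end{aligned}
$$

Then Lemma 3.3 follows from our notations. □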
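
The sufficient descent property of Lemma 3.3 can be observed numerically. The sketch below is a toy check under stated assumptions, none taken from the paper: the scalar problem min |x| + |y| + 0.5(z - b)^2 subject to x + y - z = 0 (A = B = 1, C = -1), Bregman kernels 0.5(.)^2, and penalty 25. For this data the Lipschitz and convexity constants all equal one, so the weight of the extra term in the merit function is taken as 2 / alpha, the value of 2 l_phi^2 / (alpha sigma_C) here. The helper names `soft_threshold` and `merit_values` are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def merit_values(b=3.0, alpha=25.0, iters=200):
    """Run the toy 3-block BADMM and record the merit function
    L_alpha(w^k) + sigma_0 * (z^k - z^{k-1})^2 for k = 1, 2, ..."""
    sigma_0 = 2.0 / alpha  # assumed: 2*l_phi^2/(alpha*sigma_C) with unit constants
    x = y = z = p = 0.0
    values = []
    for _ in range(iters):
        z_old = z
        x = soft_threshold(alpha * (z - y) - p + x, 1.0) / (alpha + 1.0)
        y = soft_threshold(alpha * (z - x) - p + y, 1.0) / (alpha + 1.0)
        z = (b + p + alpha * (x + y) + z) / (alpha + 2.0)
        p = p + alpha * (x + y - z)
        r = x + y - z  # constraint residual A x + B y + C z
        lagrangian = abs(x) + abs(y) + 0.5 * (z - b) ** 2 + p * r + 0.5 * alpha * r * r
        values.append(lagrangian + sigma_0 * (z - z_old) ** 2)
    return values


values = merit_values()
is_nonincreasing = all(values[i + 1] <= values[i] + 1e-9 for i in range(len(values) - 1))
```

In this toy run the recorded merit values decrease monotonically, which is the behaviour the lemma predicts once the penalty exceeds the bound in Assumption 2.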

Lemma 3.4. If the sequence $\{u^k\}$ is bounded, then we have

$$\sum_{k=0}^{\infty} \|w^k - w^{k+1}\|^2 < \infty.$$

In particular, the sequence $\|w^k - w^{k+1}\|$ is asymptotically regular, namely,
$\|w^k - w^{k+1}\| \to 0$ as $k \to \infty$. Moreover, any cluster point of $w^k$ is a stationary
point of the augmented Lagrangian function $L_\alpha$ defined as in (6).

Proof. We first show that the sequence $\{w^k\}$ is bounded. Indeed we deduce from Eq. (17) that

$$
\begin{aligned}
\|C^T p^k\|^2 &= \|\nabla h(z^k) + \nabla\varphi(z^k) - \nabla\varphi(z^{k-1})\|^2 \\
&\le (\|\nabla h(z^k)\| + \ell_\varphi\|z^k - z^{k-1}\|)^2 \\
&\le 2(\|\nabla h(z^k)\|^2 + \ell_\varphi^2\|z^k - z^{k-1}\|^2).
\end{aligned}
$$

Since $C$ has full row rank, we have

$$\sigma_C\|p^k\|^2 \le 2(\|\nabla h(z^k)\|^2 + \ell_\varphi^2\|z^k - z^{k-1}\|^2). \tag{21}$$

Note that $\{u^k\}$ is bounded. This implies that the sequence $\{p^k\}$ is bounded and so are the
sequences $\{w^k\}$ and $\{\hat{w}^k\}$.

Since $\hat{w}^k$ is bounded, there exists a subsequence $\hat{w}^{k_j}$ that converges to some
element $\hat{w}^*$. By our hypothesis, the function $\hat{L}$ is lower semicontinuous, which
leads to

$$\liminf_{j\to\infty} \hat{L}(\hat{w}^{k_j}) \ge \hat{L}(\hat{w}^*),$$

so that $\hat{L}(\hat{w}^{k_j})$ is bounded from below. By Lemma 3.3, $\hat{L}(\hat{w}^k)$ is
nonincreasing, so that $\hat{L}(\hat{w}^{k_j})$ is a convergent sequence. Moreover
$\hat{L}(\hat{w}^k)$ is also convergent and $\hat{L}(\hat{w}^k) \ge \hat{L}(\hat{w}^*)$ for each $k$.

Now fix $k \in \mathbb{N}$. By Lemma 3.3 we have

$$
\begin{aligned}
\sigma_1 \sum_{i=1}^{k} (\|x^{i+1} - x^i\|^2 + \|y^{i+1} - y^i\|^2 + \|z^i - z^{i+1}\|^2)
&\le \sum_{i=1}^{k} [\hat{L}(\hat{w}^i) - \hat{L}(\hat{w}^{i+1})] = \hat{L}(\hat{w}^1) - \hat{L}(\hat{w}^{k+1}) \\
&\le \hat{L}(\hat{w}^1) - \hat{L}(\hat{w}^*) < \infty.
\end{aligned}
$$

Moreover, by inequality (15), we see that $\sum_{k=0}^{\infty} \|p^k - p^{k+1}\|^2 < \infty$. This
implies $\sum_{k=0}^{\infty} \|w^k - w^{k+1}\|^2 < \infty$, and hence $\|w^k - w^{k+1}\| \to 0$.

Let $w^* = (x^*, y^*, z^*, p^*)$ be any cluster point of $w^k$ and let $w^{k_j}$ be a subsequence
of $w^k$ converging to $w^*$. It then follows from algorithm (8) that

$$
\begin{aligned}
p^{k+1} &= p^k + \alpha(Ax^{k+1} + By^{k+1} + Cz^{k+1}), \\
-\partial f(x^{k+1}) &\ni A^T p^k + \alpha A^T(Ax^{k+1} + By^k + Cz^k) + \nabla\phi(x^{k+1}) - \nabla\phi(x^k) \\
&= A^T p^{k+1} + \alpha A^T B(y^k - y^{k+1}) + \alpha A^T C(z^k - z^{k+1}) + \nabla\phi(x^{k+1}) - \nabla\phi(x^k), \\
-\partial g(y^{k+1}) &\ni B^T p^k + \alpha B^T(Ax^{k+1} + By^{k+1} + Cz^k) + \nabla\psi(y^{k+1}) - \nabla\psi(y^k) \\
&= B^T p^{k+1} + \alpha B^T C(z^k - z^{k+1}) + \nabla\psi(y^{k+1}) - \nabla\psi(y^k), \\
-\nabla h(z^{k+1}) &= C^T p^k + \alpha C^T(Ax^{k+1} + By^{k+1} + Cz^{k+1}) + \nabla\varphi(z^{k+1}) - \nabla\varphi(z^k) \\
&= C^T p^{k+1} + \nabla\varphi(z^{k+1}) - \nabla\varphi(z^k).
\end{aligned}
$$

Since $\|w^k - w^{k+1}\|$ tends to zero, letting $j \to \infty$ in the above formulas yields

$$A^T p^* \in -\partial f(x^*), \quad B^T p^* \in -\partial g(y^*),$$
$$C^T p^* = -\nabla h(z^*), \quad Ax^* + By^* + Cz^* = 0,$$

which implies that $w^*$ is a stationary point of $L_\alpha$. □

Lemma 3.5. There exists $\kappa > 0$ such that for each $k$

$$\mathrm{dist}(0, \partial\hat{L}(\hat{w}^{k+1})) \le \kappa(\|x^k - x^{k+1}\| + \|y^k - y^{k+1}\| + \|z^k - z^{k+1}\| + \|z^k - z^{k-1}\|).$$

Proof. First, we deduce from algorithm (8) that

$$\partial\hat{L}_x(\hat{w}^{k+1}) = \partial f(x^{k+1}) + A^T p^{k+1} + \alpha A^T(Ax^{k+1} + By^{k+1} + Cz^{k+1}), \tag{22}$$

$$\partial\hat{L}_y(\hat{w}^{k+1}) = \partial g(y^{k+1}) + B^T p^{k+1} + \alpha B^T(Ax^{k+1} + By^{k+1} + Cz^{k+1}), \tag{23}$$

$$\partial\hat{L}_z(\hat{w}^{k+1}) = \nabla h(z^{k+1}) + C^T p^{k+1} + \alpha C^T(Ax^{k+1} + By^{k+1} + Cz^{k+1}) + 2\sigma_0(z^{k+1} - z^k), \tag{24}$$

$$\partial\hat{L}_{\hat{z}}(\hat{w}^{k+1}) = -2\sigma_0(z^{k+1} - z^k), \qquad \partial\hat{L}_p(\hat{w}^{k+1}) = \frac{1}{\alpha}(p^{k+1} - p^k). \tag{25}$$

Second, we apply Fermat's rule to algorithm (8) to get

$$0 \in \partial f(x^{k+1}) + A^T p^k + \alpha A^T(Ax^{k+1} + By^k + Cz^k) + \nabla\phi(x^{k+1}) - \nabla\phi(x^k),$$
$$0 \in \partial g(y^{k+1}) + B^T p^k + \alpha B^T(Ax^{k+1} + By^{k+1} + Cz^k) + \nabla\psi(y^{k+1}) - \nabla\psi(y^k).$$

Substituting this into (22) and (23), we obtain

$$\partial\hat{L}_x(\hat{w}^{k+1}) \ni \alpha A^T B(y^{k+1} - y^k) + \alpha A^T C(z^{k+1} - z^k) + \nabla\phi(x^k) - \nabla\phi(x^{k+1}) + A^T(p^{k+1} - p^k),$$

$$\partial\hat{L}_y(\hat{w}^{k+1}) \ni \alpha B^T C(z^{k+1} - z^k) + B^T(p^{k+1} - p^k) + \nabla\psi(y^k) - \nabla\psi(y^{k+1}).$$

We also substitute (17) into (24) to get

$$\partial\hat{L}_z(\hat{w}^{k+1}) = \nabla\varphi(z^k) - \nabla\varphi(z^{k+1}) + C^T(p^{k+1} - p^k) + 2\sigma_0(z^{k+1} - z^k),$$

where the last equality follows from (8).

As $\nabla\phi$, $\nabla\psi$, $\nabla\varphi$ are all Lipschitz continuous and matrices $A$, $B$,
$C$ are all bounded, the above series of estimates shows that there exists $\kappa_0 > 0$ such that

$$\mathrm{dist}(0, \partial\hat{L}(\hat{w}^{k+1})) \le \kappa_0(\|x^k - x^{k+1}\| + \|y^{k+1} - y^k\| + \|z^{k+1} - z^k\| + \|p^{k+1} - p^k\|). \tag{26}$$

On the other hand, it follows from Lemma 3.1 that

$$\|p^{k+1} - p^k\| \le \frac{\sqrt{2}(\ell_h + \ell_\varphi)}{\sqrt{\sigma_C}}\|z^{k+1} - z^k\| + \frac{\sqrt{2}\,\ell_\varphi}{\sqrt{\sigma_C}}\|z^k - z^{k-1}\| \tag{27}$$

$$\le \frac{\sqrt{2}(\ell_h + \ell_\varphi)}{\sqrt{\sigma_C}}(\|z^{k+1} - z^k\| + \|z^k - z^{k-1}\|). \tag{28}$$

Letting $\kappa_1 := \sqrt{2}(\ell_h + \ell_\varphi)/\sqrt{\sigma_C}$, we then have

$$\|p^{k+1} - p^k\| \le \kappa_1(\|z^{k+1} - z^k\| + \|z^k - z^{k-1}\|). \tag{29}$$

Let $\kappa := (\kappa_1 + 1)(\kappa_0 + 1)$. Hence Lemma 3.5 follows immediately. □

3.2. Convergence analysis. 

Theorem 3.6. Under Assumptions 1 and 2, if the sequence $\{u^k\}$ is bounded, then

$$\sum_{k=0}^{\infty} \|w^k - w^{k+1}\| < \infty.$$

In particular, the sequence $\{w^k\}$ converges to a stationary point of $L_\alpha$ defined as in (6).

Proof. From the proof of Lemma 3.4 we see that the sequence $\{\hat{w}^k\}$ is bounded. Let
$\Omega$ stand for the cluster point set of $\hat{w}^k$. Take any $\hat{w}^* \in \Omega$ and let
$\hat{w}^{k_j}$ be a subsequence of $\hat{w}^k$ converging to $\hat{w}^*$. Since by Lemma 3.3 the
sequence $\hat{L}(\hat{w}^k)$ is convergent, it follows that

$$\hat{L}(\hat{w}^*) = \lim_{j\to\infty} \hat{L}(\hat{w}^{k_j}) = \lim_{k\to\infty} \hat{L}(\hat{w}^k),$$

so that the function $\hat{L}(\cdot)$ is constant on $\Omega$.

Let us now consider two possible cases on $\hat{L}(\hat{w}^k)$. First assume that there exists
$k_0 \in \mathbb{N}$ such that $\hat{L}(\hat{w}^{k_0}) = \hat{L}(\hat{w}^*)$. Then we deduce from
Lemma 3.3 that for any $k \ge k_0$

$$\sigma_1\|u^{k+1} - u^k\|^2 \le \hat{L}(\hat{w}^k) - \hat{L}(\hat{w}^{k+1}) \le \hat{L}(\hat{w}^{k_0}) - \hat{L}(\hat{w}^*) = 0,$$

where we have used the fact that $\hat{L}(\hat{w}^k)$ is nonincreasing. This together with (26)
implies that $(w^k)$ is a constant sequence except for finitely many terms, and thus the proof is
finished in this case.

Let us now assume that $\hat{L}(\hat{w}^k) > \hat{L}(\hat{w}^*)$ for each $k \in \mathbb{N}$. By
Assumption 1, it is easy to see that $\hat{L}(\cdot)$ is a subanalytic function and thus
satisfies the K-L inequality. Then by Lemma 2.2 there exist $\eta > 0$, $\delta > 0$,
$\varphi \in \mathcal{A}_\eta$ such that

$$\varphi'(\hat{L}(\hat{w}) - \hat{L}(\hat{w}^*))\,\mathrm{dist}(0, \partial\hat{L}(\hat{w})) \ge 1$$

for all $\hat{w}$ satisfying $\mathrm{dist}(\hat{w}, \Omega) < \delta$ and
$\hat{L}(\hat{w}^*) < \hat{L}(\hat{w}) < \hat{L}(\hat{w}^*) + \eta$. By definition of $\Omega$ we
have $\lim_k \mathrm{dist}(\hat{w}^k, \Omega) = 0$. This together with the fact
$\hat{L}(\hat{w}^k) \to \hat{L}(\hat{w}^*)$ implies that there exists $k_1 \in \mathbb{N}$ such
that $\mathrm{dist}(\hat{w}^k, \Omega) < \delta$ and
$\hat{L}(\hat{w}^k) < \hat{L}(\hat{w}^*) + \eta$ for all $k \ge k_1$.
Let us fix $k > k_1$ in the following. For brevity write
$\Phi_k := \varphi(\hat{L}(\hat{w}^k) - \hat{L}(\hat{w}^*))$ and

$$S_k := \|x^k - x^{k-1}\| + \|y^k - y^{k-1}\| + \|z^k - z^{k-1}\| + \|z^{k-2} - z^{k-1}\|.$$

Then the K-L inequality

$$\mathrm{dist}(0, \partial\hat{L}(\hat{w}^k))\,\varphi'(\hat{L}(\hat{w}^k) - \hat{L}(\hat{w}^*)) \ge 1$$

holds, which along with Lemma 3.5 then yields

$$\frac{1}{\varphi'(\hat{L}(\hat{w}^k) - \hat{L}(\hat{w}^*))} \le \mathrm{dist}(0, \partial\hat{L}(\hat{w}^k)) \le \kappa S_k.$$

By Lemma 3.3, the last inequality and the concavity of $\varphi$ show

$$
\begin{aligned}
\sigma_1\|u^{k+1} - u^k\|^2 &\le \hat{L}(\hat{w}^k) - \hat{L}(\hat{w}^{k+1}) \\
&= (\hat{L}(\hat{w}^k) - \hat{L}(\hat{w}^*)) - (\hat{L}(\hat{w}^{k+1}) - \hat{L}(\hat{w}^*)) \\
&\le \frac{\Phi_k - \Phi_{k+1}}{\varphi'(\hat{L}(\hat{w}^k) - \hat{L}(\hat{w}^*))} \\
&\le \kappa S_k\,[\Phi_k - \Phi_{k+1}],
\end{aligned}
$$

or, equivalently,

$$\|x^{k+1} - x^k\|^2 + \|y^{k+1} - y^k\|^2 + \|z^{k+1} - z^k\|^2 \le \frac{\kappa}{\sigma_1} S_k\,[\Phi_k - \Phi_{k+1}].$$

We thus have

$$
\begin{aligned}
3(\|x^k - x^{k+1}\| + \|y^k - y^{k+1}\| + \|z^{k+1} - z^k\|)
&\le 3\sqrt{3}\,(\|x^{k+1} - x^k\|^2 + \|y^{k+1} - y^k\|^2 + \|z^{k+1} - z^k\|^2)^{1/2} \\
&\le 2\,S_k^{1/2} \times \left[\frac{27\kappa}{4\sigma_1}(\Phi_k - \Phi_{k+1})\right]^{1/2}.
\end{aligned} \tag{30}
$$

On the other hand, we observe that

$$2\,S_k^{1/2}\left[\frac{27\kappa}{4\sigma_1}(\Phi_k - \Phi_{k+1})\right]^{1/2} \le S_k + \frac{27\kappa}{4\sigma_1}[\Phi_k - \Phi_{k+1}],$$

which along with (30) yields

$$3(\|x^k - x^{k+1}\| + \|y^k - y^{k+1}\| + \|z^{k+1} - z^k\|) \le S_k + \frac{27\kappa}{4\sigma_1}[\Phi_k - \Phi_{k+1}].$$

Hence we have

$$\sum_{i=k_1}^{k} 3(\|x^i - x^{i+1}\| + \|y^i - y^{i+1}\| + \|z^i - z^{i+1}\|) \le \sum_{i=k_1}^{k} S_i + \frac{27\kappa}{4\sigma_1}[\Phi_{k_1} - \Phi_{k+1}].$$

Rearranging terms in the above inequality, we obtain

$$
\begin{aligned}
&2\sum_{i=k_1}^{k}\|x^i - x^{i+1}\| + 2\sum_{i=k_1}^{k}\|y^i - y^{i+1}\| + \sum_{i=k_1}^{k}\|z^i - z^{i+1}\| \\
&\quad\le \sum_{i=k_1}^{k}(\|x^i - x^{i-1}\| - \|x^i - x^{i+1}\|) + \sum_{i=k_1}^{k}(\|y^i - y^{i-1}\| - \|y^i - y^{i+1}\|) \\
&\qquad + \sum_{i=k_1}^{k}(\|z^i - z^{i-1}\| - \|z^i - z^{i+1}\|) + \sum_{i=k_1}^{k}(\|z^{i-1} - z^{i-2}\| - \|z^i - z^{i+1}\|) \\
&\qquad + \frac{27\kappa}{4\sigma_1}[\Phi_{k_1} - \Phi_{k+1}] \\
&\quad= \|x^{k_1-1} - x^{k_1}\| - \|x^k - x^{k+1}\| + \|y^{k_1-1} - y^{k_1}\| - \|y^k - y^{k+1}\| \\
&\qquad + \|z^{k_1-1} - z^{k_1-2}\| + 2\|z^{k_1} - z^{k_1-1}\| - \|z^k - z^{k-1}\| - 2\|z^k - z^{k+1}\| \\
&\qquad + \frac{27\kappa}{4\sigma_1}[\Phi_{k_1} - \Phi_{k+1}] \\
&\quad\le \|x^{k_1-1} - x^{k_1}\| + \|y^{k_1-1} - y^{k_1}\| + \|z^{k_1-1} - z^{k_1-2}\| + 2\|z^{k_1} - z^{k_1-1}\| + \frac{27\kappa}{4\sigma_1}\Phi_{k_1},
\end{aligned}
$$

where the last inequality follows from the fact that $\Phi_{k+1} \ge 0$. Since $k$ is chosen
arbitrarily, we deduce that
$\sum_{k=0}^{\infty}(\|x^k - x^{k+1}\| + \|y^k - y^{k+1}\| + \|z^k - z^{k+1}\|) < \infty$. By
inequality (29), it then implies that $\sum_{k=0}^{\infty}\|p^k - p^{k+1}\| < \infty$, from which
$\sum_{k=0}^{\infty}\|w^k - w^{k+1}\| < \infty$ follows. Consequently $\{w^k\}$ is a convergent
sequence. This completes the proof of Theorem 3.6. □


3.3. Boundedness. In the previous theorem, we have assumed the boundedness of the sequence
$\{u^k\}$. This assumption is not restrictive in general. There are actually various sufficient
conditions ensuring the boundedness of the sequence $\{u^k\}$. We present such a sufficient
condition below.


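Theorem 3.7. If (a1)-(a3) in Assumption 1 hold and the following (b1)-(b4) are satisfied:

(b1) $\inf f(x) = f^* > -\infty$, $\inf g(y) = g^* > -\infty$ and there exists $\beta_0 > 0$ such
that $\inf\{h(z) - \beta_0\|\nabla h(z)\|^2\} = h^* > -\infty$;

(b2) $f(x) + g(y)$ is coercive, namely, $\lim_{\min(\|x\|, \|y\|)\to\infty} f(x) + g(y) = +\infty$;

(b3) either $h(z) - \beta_0\|\nabla h(z)\|^2$ is coercive or $C$ is square;

(b4) $\alpha > \alpha_0$ where

$$
\alpha_0 =
\begin{cases}
\max\left\{\dfrac{2}{\beta_0\sigma_C},\ \dfrac{4[(\ell_h + \ell_\varphi)^2 + \ell_\varphi^2]}{\rho_3\sigma_C}\right\}, & \text{if } h(z) - \beta_0\|\nabla h(z)\|^2 \text{ is coercive}, \\[2ex]
\max\left\{\|C^{-1}\|^2\ell_h,\ \dfrac{2}{\beta_0\sigma_C},\ \dfrac{4[(\ell_h + \ell_\varphi)^2 + \ell_\varphi^2]}{\rho_3\sigma_C}\right\}, & \text{if } C \text{ is square};
\end{cases}
$$

then the sequence $\{u^k\}$ is bounded.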


Proof. First we deduce from Eq. (21) that

$$\frac{1}{\alpha}\|p^k\|^2 \le \frac{2}{\alpha\sigma_C}\|\nabla h(z^k)\|^2 + \sigma_0\|z^k - z^{k-1}\|^2,$$

which together with the definition of $\hat{L}$ gives

$$
\begin{aligned}
\hat{L}(\hat{w}^k) &= f(x^k) + g(y^k) + h(z^k) - \frac{1}{2\alpha}\|p^k\|^2 + \sigma_0\|z^k - z^{k-1}\|^2 + \frac{\alpha}{2}\left\|Ax^k + By^k + Cz^k + \frac{p^k}{\alpha}\right\|^2 \\
&\ge f(x^k) + g(y^k) + h(z^k) - \frac{2}{\alpha\sigma_C}\|\nabla h(z^k)\|^2 + \frac{\alpha}{2}\left\|Ax^k + By^k + Cz^k + \frac{p^k}{\alpha}\right\|^2 \\
&\ge f(x^k) + g(y^k) + h(z^k) - \beta_0\|\nabla h(z^k)\|^2 + \frac{\alpha}{2}\left\|Ax^k + By^k + Cz^k + \frac{p^k}{\alpha}\right\|^2,
\end{aligned}
$$

where $\beta_0$ is any constant such that $\inf\{h(z) - \beta_0\|\nabla h(z)\|^2\} > -\infty$ and
$h(z) - \beta_0\|\nabla h(z)\|^2$ remains coercive, whether or not $C$ is regular. Thus from the
monotonically decreasing property of $\{\hat{L}(\hat{w}^k)\}$, we obtain

$$\hat{L}(\hat{w}^1) \ge f(x^k) + g(y^k) + h(z^k) - \beta_0\|\nabla h(z^k)\|^2,$$

which then implies

$$f(x^k) + g(y^k) \le \hat{L}(\hat{w}^1) - h^* < \infty$$

and

$$h(z^k) - \beta_0\|\nabla h(z^k)\|^2 \le \hat{L}(\hat{w}^1) - f^* - g^* < \infty.$$

By condition (b2), this yields the boundedness of $\{x^k\}$ and $\{y^k\}$, and the boundedness of
$\{z^k\}$ as well whenever $h(z) - \beta_0\|\nabla h(z)\|^2$ is coercive.

Similarly, from Lemma 3.3 we can obtain

$$\sigma_1\|z^k - z^{k-1}\|^2 \le \hat{L}(\hat{w}^1) - (f^* + g^* + h^*) := M_1 < \infty, \tag{31}$$

which shows the boundedness of $\{\|z^k - z^{k-1}\|\}$. Now, let us assume that the function
$h(z) - \beta_0\|\nabla h(z)\|^2$ is not coercive but the matrix $C$ is nonsingular. We then
justify the boundedness of $\{z^k\}$ in this case. In effect, by using again Lemma 3.3 and
inequality (31), we get

$$\left\|Ax^k + By^k + Cz^k + \frac{p^k}{\alpha}\right\| \le \sqrt{\frac{2M_1}{\alpha}},$$

and using the inequality

$$\left\|Ax^k + By^k + Cz^k + \frac{p^k}{\alpha}\right\| \ge \|Cz^k\| - \|Ax^k + By^k\| - \frac{1}{\alpha}\|p^k\|,$$

we then have

$$\|Cz^k\| - \frac{1}{\alpha}\|p^k\| \le \sqrt{\frac{2M_1}{\alpha}} + M_2, \tag{32}$$

where $M_2 := \sup_k \|Ax^k + By^k\|$. It thus follows from Eq. (17) and condition (b3) that

$$
\begin{aligned}
\|p^k\| &\le \|(C^T)^{-1}\|\,\|C^T p^k\| = \|C^{-1}\|\,\|C^T p^k\| \\
&\le \|C^{-1}\|\,\|\nabla h(z^k) + \nabla\varphi(z^k) - \nabla\varphi(z^{k-1})\| \\
&\le \|C^{-1}\|\,(\|\nabla h(z^k)\| + \ell_\varphi\|z^k - z^{k-1}\|).
\end{aligned}
$$

With any fixed $z^*$, we clearly have

$$
\begin{aligned}
\|\nabla h(z^k)\| &\le \|\nabla h(z^k) - \nabla h(z^*)\| + \|\nabla h(z^*)\| \\
&\le \ell_h\|z^k - z^*\| + \|\nabla h(z^*)\| \\
&\le \ell_h(\|z^k\| + \|z^*\|) + \|\nabla h(z^*)\|,
\end{aligned}
$$

and furthermore,

$$\|p^k\| \le \|C^{-1}\|\,\{\ell_h(\|z^k\| + \|z^*\|) + \|\nabla h(z^*)\| + \ell_\varphi\|z^k - z^{k-1}\|\}.$$

Hence we have

$$
\begin{aligned}
\|Cz^k\| - \frac{1}{\alpha}\|p^k\| &\ge \frac{1}{\|C^{-1}\|}\|z^k\| - \frac{1}{\alpha}\|p^k\| \\
&\ge \frac{1}{\|C^{-1}\|}\|z^k\| - \frac{\|C^{-1}\|}{\alpha}\{\ell_h(\|z^k\| + \|z^*\|) + \|\nabla h(z^*)\| + \ell_\varphi\|z^k - z^{k-1}\|\},
\end{aligned}
$$

which together with (32) implies

$$
\begin{aligned}
\left(\frac{1}{\|C^{-1}\|} - \frac{\|C^{-1}\|\ell_h}{\alpha}\right)\|z^k\|
&\le \sqrt{\frac{2M_1}{\alpha}} + M_2 + \frac{\|C^{-1}\|}{\alpha}(\ell_h\|z^*\| + \|\nabla h(z^*)\|) + \frac{\|C^{-1}\|\ell_\varphi}{\alpha}\|z^k - z^{k-1}\| \\
&\le \sqrt{\frac{2M_1}{\alpha}} + M_2 + \frac{\|C^{-1}\|}{\alpha}\left(\ell_h\|z^*\| + \|\nabla h(z^*)\| + \ell_\varphi\sqrt{\frac{M_1}{\sigma_1}}\right),
\end{aligned}
$$

where the last inequality follows from (31). By condition (b4), the sequence $\{z^k\}$ is then
bounded, and so is the sequence $\{u^k\}$. □

Remark 1. It is easy to see that the function $h(x) = \|Ax - b\|^2$, for any matrix $A$ and vector $b$, satisfies
conditions (bl) and (b3) with /3q = ^j-. 
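
To make the lower-boundedness part of Remark 1 concrete, the following short estimate is a sketch under the assumptions that $A \neq 0$ and that $\|A\|$ denotes the spectral norm. It exhibits one admissible range of $\beta_0$; the bound $1/(4\|A\|^2)$ is derived here for illustration and is not claimed to be the sharpest constant.

```latex
\begin{aligned}
\nabla h(x) &= 2A^{T}(Ax - b), \\
\|\nabla h(x)\|^2 &\le 4\|A\|^2\,\|Ax - b\|^2 = 4\|A\|^2\,h(x), \\
h(x) - \beta_0\|\nabla h(x)\|^2 &\ge \big(1 - 4\beta_0\|A\|^2\big)\,h(x) \ge 0
\quad \text{whenever } 0 < \beta_0 \le \frac{1}{4\|A\|^2}.
\end{aligned}
```

If, in addition, $A$ has full column rank and $\beta_0 < 1/(4\|A\|^2)$, the same estimate shows that $h(x) - \beta_0\|\nabla h(x)\|^2$ is coercive, since $\|Ax - b\| \to \infty$ as $\|x\| \to \infty$.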

3.4. Main results. 

Combining Theorems 3.6 and 3.7, we present the following convergence theorem for the 3-block BADMM procedure.

Theorem 3.8. If Assumption 1 and conditions (b1)-(b4) in Theorem 3.7 are satisfied, then the sequence $\{w^k\}$ generated by the 3-block BADMM procedure converges to a stationary point of the augmented Lagrangian $L_\alpha$.


We now extend this result to the $N$-block case. Thus, let us consider the following composite optimization problem:

\[
\begin{aligned}
\min\ & f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N) \\
\text{s.t.}\ & A_1x_1 + A_2x_2 + \cdots + A_Nx_N = 0,
\end{aligned}
\tag{33}
\]


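where $A_i \in \mathbb{R}^{m\times n_i}$, $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$, $i = 1, 2, \ldots, N-1$, are proper lower semicontinuous functions, and $f_N : \mathbb{R}^{n_N} \to \mathbb{R}$ is a smooth function. The associated BADMM algorithm takes the form:

\[
\begin{cases}
x_1^{k+1} = \arg\min_{x_1\in\mathbb{R}^{n_1}} L_\alpha(x_1, x_2^k, \ldots, x_N^k, p^k) + \Delta_{\phi_1}(x_1, x_1^k) \\
\qquad\vdots \\
x_N^{k+1} = \arg\min_{x_N\in\mathbb{R}^{n_N}} L_\alpha(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N, p^k) + \Delta_{\phi_N}(x_N, x_N^k) \\
p^{k+1} = p^k + \alpha(A_1x_1^{k+1} + A_2x_2^{k+1} + \cdots + A_Nx_N^{k+1})
\end{cases}
\tag{34}
\]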

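where $\Delta_{\phi_i}$, $i = 1, 2, \ldots, N$, are the Bregman distances associated with functions $\phi_i$, and the corresponding Lagrangian function $L_\alpha : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \cdots \times \mathbb{R}^{n_N} \times \mathbb{R}^{m} \to \mathbb{R}$ is defined by

\[
L_\alpha(x_1, x_2, \ldots, x_N, p) := \sum_{i=1}^{N} f_i(x_i) + \Big\langle p, \sum_{i=1}^{N} A_ix_i \Big\rangle + \frac{\alpha}{2}\Big\|\sum_{i=1}^{N} A_ix_i\Big\|^2. \tag{35}
\]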
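
The Lagrangian (35) can be rewritten by completing the square in the multiplier term; this is the form that appears in the boundedness argument above and in the specialization (38). A short derivation follows, where $r := \sum_{i=1}^{N} A_ix_i$ is a symbol introduced here only for brevity:

```latex
\begin{aligned}
\langle p, r\rangle + \frac{\alpha}{2}\|r\|^2
&= \frac{\alpha}{2}\Big\|r + \frac{p}{\alpha}\Big\|^2 - \frac{1}{2\alpha}\|p\|^2, \\
L_\alpha(x_1,\ldots,x_N,p)
&= \sum_{i=1}^{N} f_i(x_i) + \frac{\alpha}{2}\Big\|\sum_{i=1}^{N} A_ix_i + \frac{p}{\alpha}\Big\|^2 - \frac{1}{2\alpha}\|p\|^2 .
\end{aligned}
```

Since the last term does not depend on any $x_i$, each block subproblem in (34) only involves $f_i$, the shifted quadratic penalty, and the Bregman term.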

It is then straightforward to establish a convergence result similar to Theorem 3.8.


Theorem 3.9. If the following (d1)-(d7) are satisfied:

(d1) $\langle A_NA_N^Tx, x\rangle = \|A_N^Tx\|^2 \ge \sigma_{A_N}\|x\|^2$, $\forall x \in \mathbb{R}^{m}$, namely, $A_N$ is full row rank;

(d2) $\nabla f_N$, $\nabla\phi_i$, $i = 1, 2, \ldots, N$, are Lipschitz continuous;

(d3) either $f_i$ or $\phi_i$, $i = 1, 2, \ldots, N$, is strongly convex;

(d4) $f_1 + f_2 + \cdots + f_N$ is subanalytic and coercive;

(d5) $\inf f_i = f_i^* > -\infty$, $i = 1, 2, \ldots, N-1$, and there exists $\beta_0 > 0$ such that $\inf\{f_N(x_N) - \beta_0\|\nabla f_N(x_N)\|^2\} = f_N^* > -\infty$;

(d6) either $f_N - \beta_0\|\nabla f_N\|^2$ is coercive, or $A_N$ is square;





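(d7) $\alpha > \alpha_0$, where

\[
\alpha_0 =
\begin{cases}
\max\Big\{\dfrac{2}{\beta_0\sigma_{A_N}},\ \dfrac{4[(\ell_{f_N} + \ell_{\phi_N})^2 + \ell_{\phi_N}^2]}{\mu_N\sigma_{A_N}}\Big\}, & \text{if } f_N - \beta_0\|\nabla f_N\|^2 \text{ is coercive}, \\[2mm]
\|A_N^{-1}\|^2\max\Big\{\ell_{f_N},\ \dfrac{4[(\ell_{f_N} + \ell_{\phi_N})^2 + \ell_{\phi_N}^2]}{\mu_N}\Big\}, & \text{if } A_N \text{ is square},
\end{cases}
\]

where $\mu_N$ is the strong convexity coefficient of $f_N$ or $\phi_N$, and $\ell_{f_N}$ and $\ell_{\phi_N}$ are respectively the Lipschitz coefficients of $\nabla f_N$ and $\nabla\phi_N$,

then the sequence $\{(x_1^k, x_2^k, \ldots, x_N^k, p^k)\}$ converges to a stationary point of $L_\alpha$ defined as in (35).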

Remark 2. We notice that whenever any $f_i$ is strongly convex, the function $\phi_i$ in the Bregman distance can be taken as zero in the $i$-th update of procedure (34).

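Remark 3. For convenience of applications, we list some specifications of Theorem 3.9 as follows.

(i) Underdetermined linear system of equations: In this case, $f_i = 0$, $i = 1, 2, \ldots, N$, and $m < \sum_{i=1}^{N} n_i$. The problem (33) degenerates to

\[
\begin{aligned}
\min\ & 0 \\
\text{s.t.}\ & A_1x_1 + A_2x_2 + \cdots + A_Nx_N = 0,
\end{aligned}
\tag{36}
\]

which amounts to solving the underdetermined linear system of equations:

\[
Ax = 0, \tag{37}
\]

where $A = [A_1, A_2, \ldots, A_N]$ and $x = [x_1^T, \ldots, x_N^T]^T$. In this case, the BADMM algorithm takes the form:

\[
\begin{cases}
x_1^{k+1} = \arg\min_{x_1\in\mathbb{R}^{n_1}} \dfrac{\alpha}{2}\Big\|A_1x_1 + A_2x_2^k + \cdots + A_Nx_N^k + \dfrac{p^k}{\alpha}\Big\|^2 + \Delta_{\phi_1}(x_1, x_1^k) \\
\qquad\vdots \\
x_N^{k+1} = \arg\min_{x_N\in\mathbb{R}^{n_N}} \dfrac{\alpha}{2}\Big\|A_1x_1^{k+1} + \cdots + A_{N-1}x_{N-1}^{k+1} + A_Nx_N + \dfrac{p^k}{\alpha}\Big\|^2 + \Delta_{\phi_N}(x_N, x_N^k) \\
p^{k+1} = p^k + \alpha(A_1x_1^{k+1} + A_2x_2^{k+1} + \cdots + A_Nx_N^{k+1}).
\end{cases}
\tag{38}
\]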

We easily check that in this special case all the assumptions in Theorem 3.9 are met whenever the matrix $A_N$ is nonsingular. So, by Theorem 3.9, the procedure (38) converges to a point $(x_1^*, x_2^*, \ldots, x_N^*, p^*)$. The point $(x_1^*, x_2^*, \ldots, x_N^*)$ is clearly a solution of (37) by the last equation in (38). We notice that the same problem was studied by Sun, Luo and Ye, who considered the case that $A$ is a square nonsingular matrix. To solve the linear system of equations, they suggested a novel randomly permuted ADMM and proved its expected convergence.
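
As an illustration of procedure (38), the following is a minimal sketch assuming the quadratic Bregman functions $\phi_i = \frac{\gamma}{2}\|\cdot\|^2$, for which each block update reduces to a linear solve. The names `block_update`, `badmm_linear_system` and the parameter `gamma` are hypothetical and introduced here only for illustration; no claim is made that arbitrary values of `alpha` and `gamma` satisfy the conditions of Theorem 3.9.

```python
import numpy as np

def block_update(blocks, xs, p, i, alpha, gamma):
    """One block step of (38), assuming phi_i = (gamma/2)||.||^2."""
    A = blocks[i]
    # Contribution of all other blocks at their current values.
    rest = sum(B @ x for j, (B, x) in enumerate(zip(blocks, xs)) if j != i)
    # First-order condition of the subproblem:
    # (alpha A^T A + gamma I) x = gamma x_i^k - alpha A^T (rest + p/alpha)
    lhs = alpha * A.T @ A + gamma * np.eye(A.shape[1])
    rhs = gamma * xs[i] - alpha * A.T @ (rest + p / alpha)
    return np.linalg.solve(lhs, rhs)

def badmm_linear_system(blocks, xs, alpha=1.0, gamma=1.0, iters=100):
    """Gauss-Seidel sweep over the blocks followed by the multiplier update."""
    xs = [x.copy() for x in xs]
    p = np.zeros(blocks[0].shape[0])
    for _ in range(iters):
        for i in range(len(blocks)):
            xs[i] = block_update(blocks, xs, p, i, alpha, gamma)
        p = p + alpha * sum(A @ x for A, x in zip(blocks, xs))
    return xs, p
```

A typical call passes the column blocks $A_1, \ldots, A_N$ as a list of arrays together with an initial guess for each $x_i$; the returned multiplier `p` stops changing exactly when $Ax = 0$ holds, which mirrors the last equation in (38).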

(ii) Two blocks case: $N = 2$. It is easily seen that Theorem 3.9 in this case reduces to the convergence of the conventional BADMM procedure:

\[
\begin{cases}
x_1^{k+1} = \arg\min_{x_1\in\mathbb{R}^{n_1}} L_\alpha(x_1, x_2^k, p^k) + \Delta_{\phi_1}(x_1, x_1^k) \\
x_2^{k+1} = \arg\min_{x_2\in\mathbb{R}^{n_2}} L_\alpha(x_1^{k+1}, x_2, p^k) + \Delta_{\phi_2}(x_2, x_2^k) \\
p^{k+1} = p^k + \alpha(A_1x_1^{k+1} + A_2x_2^{k+1})
\end{cases}
\tag{39}
\]

for the problem:

\[
\begin{aligned}
\min\ & f_1(x_1) + f_2(x_2) \\
\text{s.t.}\ & A_1x_1 + A_2x_2 = 0.
\end{aligned}
\tag{40}
\]

Thus, Theorem 3.9 includes the results established in [31, 44] as special cases.

(iii) The unconstrained minimization case:

\[
\min\ f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N), \tag{41}
\]

where $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$, $i = 1, 2, \ldots, N-1$, are proper lower semicontinuous functions, and $f_N : \mathbb{R}^{n_N} \to \mathbb{R}$ is a smooth function. Even though no constraint exists in this case, a similar Bregman alternating direction method (BADM) can be defined as follows:


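\[
\begin{cases}
x_1^{k+1} = \arg\min_{x_1\in\mathbb{R}^{n_1}} f_1(x_1) + \Delta_{\phi_1}(x_1, x_1^k) \\
\qquad\vdots \\
x_N^{k+1} = \arg\min_{x_N\in\mathbb{R}^{n_N}} f_N(x_N) + \Delta_{\phi_N}(x_N, x_N^k).
\end{cases}
\tag{42}
\]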


Following exactly the proof of Theorems 3.6 and 3.7, we can immediately obtain the convergence of (42) under the following conditions:

(e1) $\inf f_i = f_i^* > -\infty$, $i = 1, 2, \ldots, N$;

(e2) $\nabla f_N$, $\nabla\phi_i$, $i = 1, 2, \ldots, N$, are Lipschitz continuous;

(e3) either $f_i$ or $\phi_i$, $i = 1, 2, \ldots, N$, is strongly convex;

(e4) $f_1 + f_2 + \cdots + f_N$ is subanalytic and coercive.
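
With the quadratic choice $\phi_i = \frac{\gamma}{2}\|\cdot\|^2$, each update in (42) is a proximal step on $f_i$ with step size $1/\gamma$. The sketch below illustrates this on two scalar blocks, assuming $f_1 = |\cdot|$ and $f_2 = \frac{1}{2}(\cdot - c)^2$; these test functions and the names `prox_abs`, `prox_quadratic` and `badm` are illustrative assumptions, not part of the original text.

```python
def prox_abs(v, tau):
    """argmin_x |x| + (1/(2*tau)) * (x - v)**2, i.e. soft thresholding."""
    shrunk = max(abs(v) - tau, 0.0)
    return shrunk if v > 0 else -shrunk

def prox_quadratic(c):
    """Return the prox of f(x) = 0.5 * (x - c)**2 as a function of (v, tau)."""
    return lambda v, tau: (tau * c + v) / (1.0 + tau)

def badm(prox_ops, xs, gamma, iters):
    """Procedure (42) with phi_i = (gamma/2)|.|^2: blockwise proximal steps."""
    tau = 1.0 / gamma
    for _ in range(iters):
        xs = [prox(x, tau) for prox, x in zip(prox_ops, xs)]
    return xs
```

Because no constraint couples the blocks, the iteration decouples into independent proximal point iterations, one per block.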


4. Demonstration examples 


In this section, a simulated example and a real-world application are provided to support the 
correctness of convergence of the proposed 3-block Bregman ADMM for solving non-convex 
composite problems. 

Consider the non-convex optimization problem with 3-block variables deduced from matrix decomposition applications (see e.g. [3, 46, 57]):

\[
\begin{aligned}
\min_{L,S,T}\ & \|L\|_* + \lambda\|S\|_{1/2}^{1/2} + \frac{\mu}{2}\|T - M\|^2 \\
\text{s.t.}\ & T = L + S,
\end{aligned}
\tag{43}
\]


where $M$, $T$, $L$ and $S$ are all $m \times n$ matrices, $M$ is a given observation, $T$ is an ideal observation, $\|L\|_* := \sum_{i=1}^{\min(m,n)} \sigma_i(L)$ is the nuclear norm of $L$, $\|S\|_{1/2}^{1/2} := \sum_{i=1}^{m}\sum_{j=1}^{n} |S_{ij}|^{1/2}$ is the $\ell_{1/2}$ quasi-norm of $S$, $\lambda$ is a trade-off parameter between the spectral sparsity term $\|L\|_*$ and the element-wise sparsity term $\|S\|_{1/2}^{1/2}$, and $\mu$ is a parameter associated with the noise level. The augmented Lagrange function of this optimization problem is given by

\[
L_\alpha(L, S, T, p) = \|L\|_* + \lambda\|S\|_{1/2}^{1/2} + \frac{\mu}{2}\|T - M\|^2 + \langle p, T - (L + S)\rangle + \frac{\alpha}{2}\|T - (L + S)\|^2. \tag{44}
\]

According to the 3-block BADMM, the optimization problem (43) can be solved by the following procedure:

\[
\begin{cases}
L^{k+1} = \arg\min_{L} L_\alpha(L, S^k, T^k, p^k) + \Delta_\phi(L, L^k) \\
S^{k+1} = \arg\min_{S} L_\alpha(L^{k+1}, S, T^k, p^k) + \Delta_\psi(S, S^k) \\
T^{k+1} = \arg\min_{T} L_\alpha(L^{k+1}, S^{k+1}, T, p^k) + \Delta_\varphi(T, T^k) \\
p^{k+1} = p^k + \alpha(T^{k+1} - (L^{k+1} + S^{k+1})).
\end{cases}
\tag{45}
\]


Specifying $\phi(\cdot) = \psi(\cdot) = \frac{\gamma_1}{2}\|\cdot\|^2$, $\varphi(\cdot) = \frac{\gamma_2}{2}\|\cdot\|^2$ and substituting these formulations into the procedure (45), we then obtain the following closed-form iterative formulas of (45):

\[
\begin{cases}
L^{k+1} = \mathcal{S}\Big(\dfrac{\alpha(T^k - S^k + p^k/\alpha) + \gamma_1L^k}{\alpha + \gamma_1},\ \dfrac{1}{\alpha + \gamma_1}\Big) \\[3mm]
S^{k+1} = \mathcal{H}\Big(\dfrac{\alpha(T^k - L^{k+1} + p^k/\alpha) + \gamma_1S^k}{\alpha + \gamma_1},\ \dfrac{\lambda}{\alpha + \gamma_1}\Big) \\[3mm]
T^{k+1} = \dfrac{\mu M + \alpha(L^{k+1} + S^{k+1} - p^k/\alpha) + \gamma_2T^k}{\mu + \alpha + \gamma_2} \\[3mm]
p^{k+1} = p^k + \alpha(T^{k+1} - (L^{k+1} + S^{k+1})),
\end{cases}
\tag{46}
\]

where $\mathcal{S}(A, \cdot)$ indicates the operation of thresholding the singular values of matrix $A$ using the well-known soft shrinkage operator, and $\mathcal{H}(A, \cdot)$ the operation of thresholding the entries of matrix $A$ using the half shrinkage operator [49, 50, 51, 52]. The procedure (46) is the specification of BADMM for the solution of problem (43) with functions $f(x)$, $g(y)$, $h(z)$ defined by $f(L) = \|L\|_*$, $g(S) = \lambda\|S\|_{1/2}^{1/2}$, $h(T) = \frac{\mu}{2}\|T - M\|^2$ and matrices $A$, $B$, $C$ defined by $A = -I$, $B = -I$, $C = I$, where $I$ is the identity matrix, so that the constraint $AL + BS + CT = 0$ coincides with $T = L + S$. It is straightforward to verify that all the assumptions of Theorem 3.8 are satisfied. Consequently, Theorem 3.8 can be applied to predict the convergence of (46) in theory. We conduct a simulation study and an application example below to support this theoretical assertion.
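
The iteration (46) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $\alpha$ is kept fixed, the norms are taken as Frobenius norms, the Bregman weights default to $\gamma_1 = \alpha$ and $\gamma_2 = \alpha + \mu$, and the function names `svt`, `half_threshold` and `badmm_lsd` are hypothetical. The helper `half_threshold(X, lam)` is written for the convention $\min_x (x - t)^2 + \texttt{lam}\,|x|^{1/2}$, so the $S$-step passes `2 * lam / (alpha + g1)`; this scaling is tied to that convention and may differ from the one used for $\mathcal{H}$ in (46).

```python
import numpy as np

def svt(X, tau):
    """Soft-threshold the singular values of X (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def half_threshold(X, lam):
    """Entrywise minimizer of (x - t)^2 + lam * |x|^(1/2)."""
    X = np.asarray(X, dtype=float)
    out = np.zeros_like(X)
    # Entries at or below this level are set to zero.
    level = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    mask = np.abs(X) > level
    t = X[mask]
    phi = np.arccos((lam / 8.0) * (np.abs(t) / 3.0) ** (-1.5))
    out[mask] = (2.0 / 3.0) * t * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

def badmm_lsd(M, lam, mu, alpha=1.0, iters=100):
    """Fixed-alpha sketch of (46) for the model (43)."""
    g1, g2 = alpha, alpha + mu  # assumed Bregman weights gamma_1, gamma_2
    L = np.zeros_like(M, dtype=float)
    S = np.zeros_like(M, dtype=float)
    T = np.array(M, dtype=float)
    p = np.zeros_like(M, dtype=float)
    for _ in range(iters):
        L = svt((alpha * (T - S + p / alpha) + g1 * L) / (alpha + g1),
                1.0 / (alpha + g1))
        S = half_threshold((alpha * (T - L + p / alpha) + g1 * S) / (alpha + g1),
                           2.0 * lam / (alpha + g1))
        T = (mu * M + alpha * (L + S - p / alpha) + g2 * T) / (mu + alpha + g2)
        p = p + alpha * (T - (L + S))
    return L, S, T, p
```

Each of the three primal updates is the exact minimizer of its subproblem in (45) under the quadratic Bregman choice, which is why no inner solver is needed.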

We first describe some implementation issues. We set $\gamma_1 = \alpha$ and $\gamma_2 = \alpha + \mu$ in (46). In order to avoid the tedium of tuning the parameter $\alpha$, we exploit a dynamic updating scheme, e.g., $\alpha = \min(1.1\,\alpha, \alpha_{\max})$, where $\alpha_{\max}$ is a very large constant. Due to the non-convexity of this optimization problem, it is very important to choose a suitable initialization. In the following experiments, we initialized the matrix $L$ by the best rank-$r$ approximation of the matrix $M$, i.e., $L = \mathrm{SVD}(M, r)$, where $r$ was empirically set as $\mathrm{ceil}(0.01 \cdot \min(m, n))$; initialized the matrix $S$ as the zero matrix of size $m \times n$; and then initialized the matrix $T = L + S$. Finally, we terminated the algorithm by the criterion relChg < 1e-8, where relChg is defined as


\[
\mathrm{relChg} := \frac{\|[L^{k+1} - L^k,\ S^{k+1} - S^k,\ T^{k+1} - T^k]\|_F}{\|[L^k, S^k, T^k]\|_F + 1}.
\]
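
The implementation details above (stopping measure, initialization, and update of $\alpha$) can be sketched as below. The helper names `rel_chg`, `init_iterates` and `next_alpha` are hypothetical, and the default `alpha_max=1e10` is only a placeholder for the "very large constant", whose actual value is not given in the text.

```python
import math
import numpy as np

def rel_chg(new, old):
    """relChg for iterates given as tuples (L, S, T) of equally sized matrices."""
    num = np.linalg.norm(np.hstack([a - b for a, b in zip(new, old)]), "fro")
    den = np.linalg.norm(np.hstack(old), "fro") + 1.0
    return num / den

def init_iterates(M):
    """L: best rank-r approximation of M; S: zero matrix; T = L + S."""
    m, n = M.shape
    r = math.ceil(0.01 * min(m, n))
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = (U[:, :r] * s[:r]) @ Vt[:r]
    S = np.zeros_like(M, dtype=float)
    return L, S, L + S

def next_alpha(alpha, alpha_max=1e10):
    """Dynamic update alpha <- min(1.1 * alpha, alpha_max)."""
    return min(alpha * 1.1, alpha_max)
```

In a full loop, `rel_chg` would be evaluated after each sweep of (46) and the iteration stopped once it drops below 1e-8.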

Figure 1: Separation results in simulated data. (a) $\sigma = 0$; (b) $\sigma = 0.2$.

(a) Simulation study. To check the validity of model (43) and the convergence of procedure (46), we generated an observation matrix $M$ from given $L$ and $S$ (namely, the true solution) with Gaussian random disturbance $N$, and then applied procedure (46) to recover $L$ and $S$. Square matrices of size $m \times m$ were randomly generated for our simulations. The matrix $L$ was taken as $UV^T$, where $U$ and $V$ are independent $m \times r$ matrices whose elements are i.i.d. Gaussian random variables with zero mean and unit variance, and $S$ was taken as a sparse matrix whose support was chosen uniformly at random, with the entries uniformly distributed in the interval $[-50, 50]$. Then, the measurement $M$ was generated as $M = L + S + N$, where the matrix $N$ is Gaussian noise with mean zero and variance $\sigma^2$. Thus, $\sigma = 0$ corresponds to the noiseless case and $\sigma \neq 0$ corresponds to the noisy case. In simulations, the parameter $\mu$ in model (43) was set to a large value, 1e+4, in the noiseless setting, and in the noisy setting to the value from a candidate set for which the proposed algorithm performs best. The parameter $\lambda$ was empirically set
as the value ma ^ n n) ■ The performance of the algorithm is then measured in terms of the relative 
error defined by 


\[
\mathrm{relErr}_A := \frac{\|A - A^*\|_F}{\|A^*\|_F},
\]


where A indicates the recovery result of the algorithm, and A* indicates the true result. 
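
The simulated data and the relative error can be generated along the following lines. This is a sketch under assumptions: the fraction of nonzero entries (`sparsity`), the seed, and the function names `make_problem` and `rel_err` are illustrative choices not specified in the text, and the support is drawn as a fixed number of positions sampled uniformly without replacement.

```python
import numpy as np

def make_problem(m, r, sparsity, sigma, seed=0):
    """Return (M, L, S) with M = L + S + N as in the simulation setup."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((m, r))
    L = U @ V.T                                   # low-rank component
    S = np.zeros((m, m))
    k = int(round(sparsity * m * m))              # number of nonzero entries
    idx = rng.choice(m * m, size=k, replace=False)
    S.flat[idx] = rng.uniform(-50.0, 50.0, size=k)
    N = sigma * rng.standard_normal((m, m))       # Gaussian noise, variance sigma^2
    return L + S + N, L, S

def rel_err(A, A_true):
    """Relative error ||A - A*||_F / ||A*||_F."""
    return np.linalg.norm(A - A_true, "fro") / np.linalg.norm(A_true, "fro")
```

Setting `sigma=0.0` reproduces the noiseless case and `sigma=0.2` the noisy case shown in Figure 1.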

With the above settings and measure, our simulation results are shown in Figure 1. Figure 1(a) shows the curves of the relative error $\mathrm{relErr}_A$ ($A := L, S, T$) and the relative change relChg with respect to the iteration steps when no Gaussian noise is added, and Figure 1(b) shows the curves when Gaussian noise with mean 0 and variance $\sigma^2 = 0.2^2$ is added. From these curves, it can be seen that, under the chosen initialization, the procedure (46) does converge in terms of both the relative error and the relative change, as predicted.

(b) An application example. We further applied the model (43) with BADMM (46) to the
background subtraction application. Background subtraction @ is a fundamental task in the 
field of video surveillance. Its aim is to subtract the background from a video clip and meanwhile 
detect the anomalies (i.e., moving objects). From the webpage we first downloaded four video
clips: Lobby, Bootstrap, Hall, and ShoppingMall. Then we chose 600 frames from each video
clip and input these 600 frames into our algorithm. The parameter A was set as the value ———- 
In Figure 2 we exhibit the separation results of some frames in the four video clips. From Figure 2 it can be seen that our algorithm can produce a clean video background and meanwhile detect a satisfactory video foreground, which supports the validity and convergence of the proposed BADMM.

9 http://perception.i2r.a-star.edu.sg/bk_model/bk_index

Figure 2: Separation results in real-world video clips. (a) Lobby; (b) Bootstrap; (c) Hall; (d) ShoppingMall.